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Abstract 

We combine a generic method for finding fast orthogonal transforms for a given 
quasi-Monte Carlo integration problem with the multilevel Monte Carlo method. It 
is shown by example that this combined method can vastly improve the efficiency 
of quasi-Monte Carlo. 


1 Introduction 

Many simulation problems from finance and other applied fields can be written in the form 
E(/(V)), where / is a measurable function on M n and X is a standard normal vector, that 
is, X = {X\ ,..., X n ) is jointly normally distributed with E(Vj) = 0 and E (XjXk) = Sjk- 
It is a trivial observation that 

E(/(V)) = E (f(UX)) (1) 

for every orthogonal transform U : M n —> M n . It has been observed in a number of 
articles (dunlin]) that, while this reformulation does not change the simulation problem 
from the probabilistic point of view, it does make a - sometimes big - difference when 
quasi-Monte Carlo (QMC) simulation is applied to generate the realizations of X. 

Prominent examples are supplied by the well-known Brownian bridge [13] and principal 
component analysis (PCA) [T] constructions of Brownian paths which will be detailed in 
the following paragraphs. Assume we want to calculate an approximation to E (g(B)) 
where B is a Brownian motion with index set [0, T]. In most applications this can be 
reasonably approximated by E(g(i?T,..., Bt«)), where g is a corresponding function 

n n 

taking as its argument a discrete Brownian path } by which we mean a normal vector with 
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covariance matrix 


£ := 


T \ n T 

~ min (j, k)) = — 

n / j,k =1 n 
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12 3 

12 3 



There are three classical methods for sampling from (Bt,...,But) given a standard 

n n 

normal vector A", namely the forward method, the Brownian bridge construction and the 
principal component analysis construction. All of these constructions may be written in 
the form (Bt, ..., Bnr) = AX, where A is an n x n real matrix with AA T = £. 

n n 

For example, the matrix corresponding to the forward method is 


A 


( 1 0 ... 0 \ 



\1 1 ••• l J 


( 2 ) 


while PCA corresponds to A = VD, where £ = VD 2 V T is the singular value decompo¬ 
sition of £. A corresponding decomposition for the Brownian bridge algorithm is given, 
for example, in Larcher, Leobacher & Scheicher [ID]. 

It has been observed by Papageorgiou [15] that AA T = £ if and only if A = SU for 
some orthogonal matrix U, so that every linear construction of (Bt, ..., Bnr) corresponds 

n n 

to an orthogonal transform of M n . In that sense the forward method corresponds to the 
identity, PCA corresponds to S~ l VD and Brownian bridge corresponds to the inverse 
Haar transform, see Leobacher m 

Thus our original simulation problem can be written, as 


E (~g(Br ,..., Bt*)) = E(g(SX)) = E (f(X)) 

n n 

with f — g o S. In the context of discrete Brownian paths this corresponds to the 
forward method. Consequently, the same problem using the Brownian bridge takes on 
the form K(f(H~ 1 X)), where H is the matrix of the Haar transform, and has the form 
E(f(S~ 1 VDX)), with S, V, D as above, when PCA is used. 

Papageorgiou ra noted that whether or not the Brownian bridge and PCA construc¬ 
tions enhance the performance of QMC methods depends critically on the integrand / 
and he provides an example of a financial option where those two methods give much 
worse results than the forward method. That lead to the idea of searching for orthogonal 
transform tailored to the integrand. Imai & Tan 0 propose a general technique for this 
problem which they call linear transform (LT) method. 


The exact reason why orthogonal transforms might have the effect to make a problem 
more suitable for QMC is still unknown. Caflish et. al. [3] propose that those transforms 
diminish the so-called effective dimension of the problem. Owen in provided the concept 
of effective dimension of a function space. The least that can be said with confidence is 
that introducing an orthogonal transform does not introduce a bias and that there are 
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choices (like the identity) that make the problem at least equally well suited for QMC as 
the original one. 

While applying a suitable orthogonal transform to an integration problem may increase 
the performance of QMC simulation, there is also a disadvantage: the computation of the 
orthogonal transform incurs a cost, which in general is of the order 0(n 2 ). For large n 
this cost is likely to swallow any gains from the transform. In B2| it is therefore proposed 
to concentrate on orthogonal transforms which have cost of the order 0(nlog(n)) or less. 

Examples of such fast orthogonal transforms include discrete sine and cosine trans¬ 
form, Walsh and (inverse) Haar transform as well as the orthogonal matrix corresponding 
to the PCA, see Scheicher [16j and Leobacher E3 

A relatively recent approach to enhance the efficiency of Monte Carlo simulation has 
been proposed by Giles pE] and Heinrich [7j. They propose a multilevel procedure by 
combining Monte Carlo based on different time discretizations. The improvement in 
computational efficiency by using quasi-Monte Carlo instead of Monte Carlo together 
with the multilevel method is shown in Giles & Waterhouse ||5j where the authors used 
a rank-1 lattice rule with a random shift. Furthermore, they give a short discussion on 
the three classical sampling methods mentioned above. We contribute to the topic by 
Ending an orthogonal transform adapted to the multilevel method, thus making it even 
more efficient. 

The remainder of the paper is organized as follows. Section [2] reviews basic properties 
of Householder reflections and in Section [3] we describe an algorithm for finding a fast 
orthogonal transform using Householder reflections. The main part of our article, Section 
SI recalls some of the basics of multilevel (quasi-)Monte Carlo and discusses how the ideas 
of Section [3] can be carried over to multilevel quasi-Monte Carlo integration. 

Section [5] gives a numerical example where the method described earlier is applied to 
an example from finance. We will see that the method improves the efficency of multilevel 
quasi-Monte Carlo integration. 


2 Householder Reflections 


We recall the definition and basic properties of Householder reflections from Golub & Van 
Loan {6J. 

Definition 2.1. A matrix of the form 


U 


1-2 


vv 

V T V 


where v E is called a Householder reflection. The vector v is called the defining 
Householder vector. 


In the following proposition, e\ denotes the first canonical basis vector in M n , e\ = 

( 1 , 0 ,..., 0 ). 

Proposition 2.2. Householder reflections have the following properties: 


3 



1. Let U be a Householder reflection with Householder vector v. If x EW 1 is a vector 
then Ux is the reflection of x in the hyperplane spanlu}- 1 . In particular, U is 
orthogonal and symmetric, i.e. U~ l = U. 

2. Given any vector a 6 I” we can find v E R n such that for the corresponding 
Householder reflection U we have Ua = ||a||ei. The computation of the Householder 
vector uses 3 n floating point operations. 

3. The computation of Ux uses at most An floating point operations. 

Proof. See Golub & Van Loan [61 Chapter 5.1]. □ 

3 Regression Algorithm 

In this section we give a short description of a rather general method for constructing fast 
and efficient orthogonal transforms. Parts of the material have already been presented in 
|9j, but we include them to make the paper self-contained. 

Let / : R n —» R be a measurable function with E(/(AT) 2 ) < oo for a standard normal 
vector X. Wang & Sloan mi consider functions of the form 

f(X) = g(wJX,...,wlX) (3) 

with Wi,..., w m E W 1 and g : R m —> R. The authors show that for such functions there 
exists an orthogonal transform that reduces the dimension of / to at most m. Therefore 
the integration problem is not as high-dimensional as it seems and we have a convergence 
rate of the QMC algorithm applied to the transformed problem corresponding to m rather 
than n. We give a slightly modified version of their arguments to reduce the dimension of 
/, because we suggest using Householder reflections to generate the orthogonal transform 
which guarantees that the transform can be applied using at most 0{n log(n)) operations 
if m < log(n). 

Assume that w\ is not the zero vector and let U\ : R" —> R n be a Householder reflection 
which maps e\ to wi/||wi||. Then wJU\X = ||u; 1 ||e] r A’ = ||ty 1 ||A’ 1 and therefore 


f{U\X) = g (HIPG, ( U lW2 yx ,..., ( U lWm ) T X). 

Next we write ( UiWk) T X = (UiWkffXi + (Uiw^)J n X 2 ... n . That is, 

f{UiX) = gflXi, wJ tl X 2 ... n ,.. .,wl A X 2 ... n ) 

where w 2 ,i ,..., u> m ,i £ R n ~h Assuming that w 2> i 0, let U 2 : R n_1 —> R 71 ^ 1 be the 
Householder reflection that maps e\ to ry 2 ,i/||^ 2 ,i|| and let 


Uo = 


1 0 
0 U 2 


Then U 2 is a Householder reflection from R n to R n and 

, T 


f{U x U 2 X) = g 2 { Ad, X 2 , w 3 \ 2 X 3 .„ n ,..., wfl 2 X 3 ... n ) 
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with W 3 t 2 , • • •, Wm ,2 £ R" 2 . Proceeding that way one arrives at 

f{Ul ■ • • U*X) = grn(X u X 2 ,..., X*) 

for some m < m (We may have m < m if some transformed Wk are zero). 

In the spirit of HU we propose a procedure for more general integration problems. Let 
us assume that the function / is of the form 

f(x) =g(hi(x),...,hm(x)) 

where m < n, g : R m —> R and hk : R n —> R, k — 1,..., m. We want to approximate 
every hk by a linear function, i.e. 

h k (x) & ajx + b k 

with ak G R n and bk G R. The approximation is done by a “linear regression” approach 
and therefore, for every k = 1,..., m, we minimize 

E ^hk(X) — a^X — bk) 2 ^j —>■ min . 

First order conditions give for k — 1,..., m 

ak,j = E (Xjhk(X )), j = l,...,n; (4) 

6 fc = E(/i fc (X)) . (5) 

Therefore, P ]) -(|5| ) minimizes the variance of the difference between each hk(X ) and its 
linear approximation + bk- So 

Y(h k (X)) =E((h fc (X)-6 fc ) 2 ) 

= E((aIX) 2 + (h fc (X)-6 fc -aIX) 2 ) 

= ||a fc || 2 + ¥(^(X)-aJx) . 

That is, ||a/ c || 2 /V(hfc(A")) measures the fraction of variance captured by the linear approx¬ 
imation. 

Now we approximate the function / by substituting the hk with the linear functions 
obtained by the linear regression, i.e. 

f(x) w g(ajx + &i,..., + b m ) = g(ajx ,..., a^x). 

Therefore / is approximated by a function of the form (j3]) and we can proceed in the same 
way as at the beginning of this section to determine a fast orthogonal transform by using 
Householder reflections. 

Note that the method is only practical if the expectations E (Xjhk{X)) in (j3J) can be 
computed explicitly or at least efficiently. After the statement of the algorithm we will 
give an example where explicit calculation is possible. 

Algorithm 3.1. Let X\,... ,X n be independent standard normal variables. Let f be a 
function f : R n —> R, which is of the form f = g o h where h : R n —y R m and 
g : R m —> R. 
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1. Start with k,£ — 1 and U = I; 

2. ak,j : = ¥,(Xjhk(UX)) for j = k ,... ,n; 

3. a kJ 0 for j — 1,... ,k — 1; 

4- if IKII = 0 go to\2f 

5. else let Ui be a Householder reflection that maps ei to Ofc/IKII/ 

6. U =UU e ; £ = £ + 1; 

7. k — k + 1; 

8. while k < m, go back to (H; 

9. Compute E(f(UX )) using QMC. 

Example 3.2. We give an example from finance for which Algorithm 13.11 can be applied 
efficiently. Motivated by a discrete arithmetic Asian option let us consider 


f(X) = max 


w k exp y ^(c kii Xi + d kii ) 


K k =1 


. i =1 



with Wk,Ck t i,d k j G R. Now we can write f(X) = g{h\{X)) with g(y) = max(r/ — A", 0) 
and hflX) = w k ex P(XKi( c MK + 4,»))- In that case we can compute E(X,Ai(AT)) 
explicitly. It is easily verified that, with f> denoting the standard normal density, <f>(x ) = 

exp(— x72)/KK 


exp(ca: + d)4>{x)dx 


exp(c 2 /2 + d) 


and 


xexp(cx + d)<f>(x)dx = cexp(c 2 /2 + d) 


for any c, d G R. Therefore, we obtain 
a hj = E^XjhflX)) 

= / ... / Xjhi(x)cj)(xi)... 4>(x n )dx i... dx r 


n ,.2 
Cl. 


y w k c k ,j exp 4, 


In [9] it was calculated that for practical parameters ||a|| 2 is typically larger than 0.99 

v(M*)). 


6 



4 Multilevel Quasi-Monte Carlo 


We start with an abstract formulation of the multilevel (quasi-)Monte Carlo method: 
suppose we want to approximate E(Y) for some random variable Y which has finite 
expectation. Suppose further that we have a sequence of sufficiently regular functions 
f e : M. mC —y M such that 


lim E (f(X e )) = E(Y), 

l—>oo 


( 6 ) 


where for each £ > 0, X e denotes an m- dimensional standard normal vector. (161) 
states that there exists a sequence of algorithms which approximate E(Y) with increas¬ 
ing accuracy. For example, if f e (X £ ) has finite variance, we can approximate E(Y) by 
jr J2k=o using sufficiently large £ and N, where (X[) k>o is a sequence of indepen¬ 

dent standard normal vectors. 

Usually, evaluation of f £ (X £ ) becomes more costly with increasing £ and N. Multi¬ 
level methods sometimes help us to save significant proportions of computing time by 
computing more samples for the coarser approximations, which need less computing time 
but have higher variance. 

We will need the following definition in the statement of the multilevel Monte Carlo 
method: for any m G N and any £ e No we call the nU” 1 x mf matrix C m j = 
with 


( Cm,i)i,j 


i 

s/m 

0 


if {i — 1 )m + 1 < j < i m 
else 


the coarsening matrix from level £ to level £ — 1. For example, in the case of m = 2 the 
coarsening matrix is given by 


( ^ 

1 

V2 

0 

0 

0 .. 

. 0 

0 ^ 

0 

0 

1 

V2 

7! 0 ■■ 

. 0 

0 

l 0 

0 

0 

0 

0 .. 

1 

• U2 

75 ) 


The following lemma is simple to verify and therefore we leave the proof to the reader. 

Lemma 4.1. Let m G N. If X e is an m e -dimensional standard normal vector, then 
C m /X e - is an m £ ~ l -dimensional standard normal vector. 


□ 


Obviously we have 

L 

E(Y) « E (. f L (X L )) = E (/°(X 0 )) + ]Te {f(X e )) - E (f^X*- 1 )) 

i =i 

= E (/°(X 0 )) + ^E (f e (X e )) - E (f-\C m/ X e )) 

t=\ 

= E (/°(W 0 )) + ^E (f(X e ) - f-\C m/ X e )) (7) 

i=\ 
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Equation (Q becomes useful if, as is often the case in practice, the expectation 

can be approximated to the required level of accuracy us¬ 
ing less function evaluations for bigger i while the costs per function evaluation increases. 
One typical situation where this occurs is when a stochastic differential equation is solved 
numerically using time discretization with m e time steps and f e is some function on the 
set of solution paths. See [4j for how to exploit this representation. 

In finance, f e is typically of the form f e (X) = ip(h e (X)) for some functions hr : 
R m —> R and ip : R —> R. In that context, h is some function taking as its argument 
a discrete (geometric) Brownian path, like the maximum or the average, and ip is the 
payoff that depends on the outcome of h t . 

E (f L (X L )) = E (f(X°)) + J2 E (V> ( h'(X' )) - </> ((■'-‘(C.,,1'))) 

l=\ 

= E (V- (A"(x°))) + ir E (/(ki(X 1 ), h‘(X 1 ))) , 

1=1 

where h{ = h e , h l 2 = h e ~ l o C mil and g e (yi,y 2 ) = ip{yi) ~ ^(2/2)- 

Now the integrands are precisely of the form covered by Algorithm 13. II It is therefore 
sensible to apply the corresponding orthogonal transform U e : R £ —> R^ at each level 
such that we get 

E(f L (X L )) =E(f\U°X°))+jrE(g £ (h e 1 (U e X e ),h £ 2 (U e X 1 ))) . 

i=\ 

Of course, we are free to try any other set of orthogonal transforms, like PCA. The 
advantage of using the regression algorithm is that here at each level the orthogonal 
transform is determined by taking both the fine and the coarse discretization into account. 

In the next section we shall try our method on a concrete example from finance, the 
Asian option. 

5 Asian Option 

We will consider an Asian call option in the Blaek-Scholes model, i.e. under the risk- 
neutral measure the stock price process S = (S t )t>o is given by the stochastic differential 
equation (SDE) 


dS t = rStdt + <jS t dB t 

where r is the interest rate, a is the volatility and (B t ) t >0 is a standard Brownian motion. 
Given the stock price S 0 at time 0, the solution of the SDE is given by 

S t = Sq exp ((r - a 2 / 2 ) t + aB t ) . 

The payoff of the Asian call option with fixed strike price K, maturity T and underlying 
S is 

max j S t dt — K, O^j . 



That is, if the average stock price over the time interval [0, T] is above level K, the option 
pays its holder at time T the difference between that value and K, otherwise it pays 
nothing. 

Martingale pricing theory tells us that the price of the option is given by the discounted 
expectation of the payoff function under the risk-neutral measure, see Bjork [2j Chapter 
10], i.e. 


C = exp(—rT)E (max J S t dt — K , 0 

To approximate C in the time-continuous model, a common way is to use (multilevel) 
quasi-Monte Carlo integration to compute the expectation. To that end we first approxi¬ 
mate the integral by a sum: For any equidistant time discretization with n G N points, 



1 

T 



1 n 


where X = (Ad,..., X n ) is a standard normal vector and 

S k (X) = So exp (l^r - y ^ Ay + \ , k = 1,... ,n. 

Therefore the payoff function of the Asian option is approximately 

f(X) = max ^ ~ K, 0 j . (8) 

If the time discretization consists of i = m e points with £ e N 0 , m 6 N, we denote the 
payoff function by and it is therefore given by (JSJ) with n = m £ . 

Thus we can approximate the price C using multilevel QMC integration with finest 
level L by 


C « exp(-rT) E (/°(X 0 )) + ^E (f(X £ ) - f-\C m/ X £ )) 

\ k= 1 

<-p(-rT) (E(/°(X“)) +J]E( 9 '(ft'(X'),h' 2 (Jf'))) 


= exr 


k =1 


where g t (y 1 , y 2 ) = max(?/i, 0) - ma x(y 2 , 0), 


h[(X e ) 



and 


h l 2 (X e ) 


1 


m l ~ l 


S 0 exp 

k =1 



2 


k 


T 

m £_1 
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In applying Algorithm 13. II we have to compute the vectors a ], a e 2 for each level. This can 
be done as in Example 13.21 For £ = 1,..., L and j — 1..., m e we get 



E 

H^J +1 


Sp* 

m l ~ l 




Now we compare the multilevel QMC method combined with the regression algorithm 
with multilevel Monte Carlo and multilevel quasi-Monte Carlo (forward and PCA sam¬ 
pling) numerically. For that we choose the parameters as r = 0.04, a = 0.3, Sp = 100, 
K = 100 and T = 1. At the finest level we start with 2 10 discretization points and at each 
coarser level we divide in half the number of points, i.e. L = 10 and m = 2. Furthermore, 
the number of sample points are doubled at each level starting with Nj j sample points 
at the finest level L. For the QMC approaches we take a Sobol sequence with a random 
shift. In Table [I] we compare for different values N L both the average and the standard 
deviation of the price of the Asian call option based on 1000 independent runs. Moreover, 
the average computing time for one run is given in brackets. As we can see, the regression 
algorithm yields the lowest standard deviation, but the computing time of the regression 
algorithm is slightly worse than the forward method. However, the regression algorithm 
is better than the PCA construction measured in both standard deviation and computing 
time. 

In Table [2] we compare the regression algorithm both for multilevel QMC and for QMC 
with 2 10 time steps (L = 10). We can observe that the standard deviation as well as the 
computing time of the multilevel QMC setting is significantly better compared with crude 
QMC. 
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multilevel 

Monte Carlo 

forward 

multilevel QMC 
PCA 

regression 

N l 

average stddev 

average 

stddev 

average stddev 

average stddev 

2 

7.717 0.41x10° 

(0.0057 s) 

7.735 0.19 xlO" 1 

(0.0057 s) 

7.736 0.16 x 10" 1 

(0.0088 s) 

7.739 0.10 xlO" 1 

(0.0069 s) 

4 

7.738 0.19x10° 

(0.0074 s) 

7.734 0.71 xlO" 2 

(0.0074 s) 

7.736 0.44 xlO" 2 

(0.0118 s) 

7.738 0.29 xlO" 2 

(0.0091 s) 

8 

7.748 0.54 xlO" 1 

(0.0101s) 

7.737 0.30 xlO" 2 

(0.0100 s) 

7.737 0.14x 10" 2 

(0.0165 s) 

7.736 0.10 xlO" 2 

(0.0124 s) 

16 

7.746 0.40 x 10” 1 

(0.0157 s) 

7.736 | 0.11 xlO" 2 
(0.0157 s) 

7.737 0.69 xlO" 3 

(0.0279 s) 

7.736 0.30 xlO" 3 

(0.0194 s) 

32 

7.728 0.31 x 10" 1 

(0.0266 s) 

7.736 0.49 xlO" 3 

(0.0265 s) 

7.737 0.21 xlO" 3 

(0.0585 s) 

7.736 0.10 xlO" 3 

(0.0326 s) 

64 

7.739 0.81 xlO" 2 

(0.0486 s) 

7.736 0.20 xlO" 3 

(0.0484 s) 

7.737 0.69 xlO" 4 

(0.1202 s) 

7.737 0.32 xlO" 4 

(0.0583 s) 


Table 1: Multilevel (Q)MC using 2 10 time steps (L = 10). The average and the standard 
deviation of the option price are based on 1000 runs. The average computing time is given 
in brackets. 




average 

stddev 

time (s) 

MLQMC - Regression 

(N l = 2 6 ) 

7.7366 

0.32 xlO" 4 

0.0323 

QMC - Regression 

(N = 2 12 ) 

7.7362 

1.01 xlO" 4 

0.1511 


Table 2: QMC and multilevel QMC, both combined with the regression algorithm, with 
2 10 time steps (L = 10) based on 1000 runs. 
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